Recursive data mining for role identification in electronic communications

نویسندگان

  • Vineet Chaoji
  • Apirak Hoonlor
  • Boleslaw K. Szymanski
چکیده

We present a text mining approach that discovers patterns at varying degrees of abstraction in a hierarchical fashion. The approach allows for certain degree of approximation in matching patterns, which is necessary to capture non-trivial features in realistic datasets. Due to its nature, we call this approach Recursive Data Mining (RDM). We demonstrate a novel application of RDM to role identification in electronic communications. We use a hybrid approach in which the RDM discovered patterns are used as features to build efficient classifiers. Since we want to recognize a group of authors communicating in a specific role within an Internet community, the challenge is recognize possibly different roles of an author within different communication communities. Moreover, each individual exchange in electronic communications is typically short, making the standard text mining approaches less efficient than in other applications. An example of such a problem is recognizing roles in a collection of emails from an organization in which middle level managers communicate both with superiors and subordinates. To validate our approach we use the Enron dataset which is such a collection. The results show that a classifier that uses the dominant patterns discovered by Recursive Data Mining performs well in role identification.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Recursive Data Mining for Author and Role Identification

Like paintings and verbal dialogues, written documents exhibit the author’s distinctive style and identification of the author of an anonymous document is an important and challenging task in computer security. Even more challenging is identification of a style of a group of diverse individuals acting in similar circumstances, like authors writing in certain literary period or people writing in...

متن کامل

Predicting Implantation Outcome of In Vitro Fertilization and Intracytoplasmic Sperm Injection Using Data Mining Techniques

Objective The main purpose of this article is to choose the best predictive model for IVF/ICSI classification and to calculate the probability of IVF/ICSI success for each couple using Artificial intelligence. Also, we aimed to find the most effective factors for prediction of ART success in infertile couples. MaterialsAndMethods In this cross-sectional study, the data of 486 patients are colle...

متن کامل

Modelling Customer Attraction Prediction in Customer Relation Management using Decision Tree: A Data Mining Approach

In Today’s quality- based competitive world, known as knowledge age, customer attraction is of ultimate importance. In respect to the slogan “customer is always right”, customer relation management is the core of an organizational strategy playing an important role in four aspects of customer identification, customer attraction, customer retaining, and customer satisfaction. Commercial organiza...

متن کامل

Data Mining for Identification of Forkhead Box O (FOXO3a) in Different Organisms Using Nucleotide and Tandem Repeat Sequences

 Background: Deregulation of FOXO3a gene which belongs to Forkhead box O (FOXO) transcription factors, can cause cancer (e.g. breast cancer). FOXO factors have important role in ubiquitination, acetylation, de-acetylation, protein-protein interactions and phosphorylation. Understanding the regulation and mechanisms of FOXO3a can lead to cancer treatment. The aim of this study recent association...

متن کامل

Analyzing and Investigating the Use of Electronic Payment Tools in Iran using Data Mining Techniques

In today's world, most financial transactions are carried out using done through electronic instruments and in the context of the Information Technology and Internet. Disregarding the application of new technologies at this field and sufficing to traditional ways, will result in financial loss and customer dissatisfaction. The aim of the present study is surveying and analyzing the use of elect...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Int. J. Hybrid Intell. Syst.

دوره 7  شماره 

صفحات  -

تاریخ انتشار 2010